Factor Analysis Back Ends for MLLR Transforms in Speaker Recognition
Abstract
The purpose of this work is to show how recent developments in cepstral-based systems for speaker recognition can be applied to Maximum Likelihood Linear Regression (MLLR) transforms. Speaker recognition systems based on MLLR transforms have been shown to be greatly beneficial in combination with standard systems, but most of the advances in speaker modeling techniques have been implemented for cepstral features. We show how these Factor Analysis-based advances, such as eigenchannel and ivector, can be easily employed to achieve very high accuracy. We show that they outperform the current state-of-the-art MLLR-SVM system that SRI submitted to the NIST SRE 2010 evaluation. The advantages of leveraging the new approaches are manifold: the ability to process a large amount of data, working in a reduced-dimensional space, importing any advances made for cepstral systems to the MLLR features, and the potential for system combination at the ivector level.
Similar Resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
MLLR techniques for speaker recognition
Maximum-Likelihood Linear Regression (MLLR) and Constrained MLLR (CMLLR) have recently been used for feature extraction in speaker recognition. These systems use (C)MLLR transforms as features that are modeled with Support Vector Machines (SVM). This paper evaluates and compares several of these approaches for the NIST Speaker Recognition task. Single CMLLR and up to 4-phonetic-class MLLR trans...
MLLR transforms as features in speaker recognition
We explore the use of adaptation transforms employed in speech recognition systems as features for speaker recognition. This approach is attractive because, unlike standard frame-based cepstral speaker recognition models, it normalizes for the choice of spoken words in text-independent speaker verification. Affine transforms are computed for the Gaussian means of the acoustic models used in a re...
Improving Language Recognition with Multilingual Phone Recognition and Speaker Adaptation Transforms
We investigate a variety of methods for improving language recognition accuracy based on techniques in speech recognition, and in some cases borrowed from speaker recognition. First, we look at the question of language-dependent versus language-independent phone recognition for phonotactic (PRLM) language recognizers, and find that language-independent recognizers give superior performance in b...